Observing cognitive processes in time through functional MRI model comparison

Abstract
The temporal specificity of functional magnetic resonance imaging (fMRI) is limited by a sluggish and locally variable hemodynamic response that trails the neural activity by seconds. Here, we demonstrate for an attention capture paradigm that it is nevertheless possible to extract information about the relative timing of regional brain activity during cognitive processes on the scale of 100 ms by comparing alternative signal models representing early versus late activation. We demonstrate that model selection is not driven by confounding regional differences in hemodynamic delay. We show, including replication, that activity in the dorsal anterior insula is an early signal predictive of behavioral performance, whereas amygdala and ventral anterior insula signals are not. This specific finding provides new insights into how the brain assigns salience to stimuli and emphasizes the role of the dorsal anterior insula in this context. The general analytic approach, named “Cognitive Timing through Model Comparison” (CTMC), offers an exciting and novel method to identify functional brain subunits and their causal interactions.


Supplementary Description of Modelling Approach
As the aim of our analysis is the identification of brain regions responsible for the attribution of stimulus salience, the Salience Attribution model is of primary importance for this work. In the main paper, we posit that brain regions central to the cognitive process of Salience Attribution to a distractor should satisfy three conditions, which we explain in more detail here:
1. Regions for Salience Attribution should be able to separate negative from neutral distractors. Our theoretical framework of attention capture holds that distractor images rated as emotionally negative by a general population are more distracting than neutral images. The emotional category ('negative' versus 'neutral') is regarded as crucial for the increased distractibility of such stimuli. The prolonged RTs for negative distractors in our ACES task (see Table 1) are the behavioral evidence for these assumptions. Therefore, we conclude that any region that cannot distinguish between these stimulus categories (i.e., emotional content) cannot be central to the salience attribution process. Notably, additional regions may cause distraction (i.e., delayed responses) unrelated to emotional content. However, such regions or alternative sources of salience are not the focus of this study, as our operational definition of salience is restricted to the emotional domain.
In principle, brain regions eligible for Salience Attribution may show stronger or weaker fMRI signals for negative than for neutral stimuli. Regions of the 'negative < neutral' fMRI contrast would be expected to correlate negatively with RTs, because a weaker signal for negative stimuli would go along with longer RTs. However, we restrict our analysis here to the 'negative > neutral' brain network, because our previous study (Marxen, Jacob, Hellrung, Riedel, & Smolka, 2021) did not show any brain regions that were negatively correlated with RT. We confirmed this finding in the current data and found no early Salience Attribution regions within the 'negative < neutral' fMRI contrast. Thus, the salience attribution process in the brain appears to be dominated by increased activity for stronger distractor stimuli. In turn, stronger activation for negative distractors is in line with a model in which such stimuli recruit more processing resources, which delays the execution of other tasks.

2. Regions for Salience Attribution should carry information about the RT of each trial, which is our behavioral measure of distractor salience. In our previous work (Marxen et al., 2021), we emphasized a caveat. It may be plausible that brain regions such as the amygdala, which show differential fMRI (and electrophysiological) signals for negative versus neutral stimuli, are responsible for the RT distractor effect observed in group studies. However, the actual evidence for such a claim is weak as long as no trial-by-trial correlations between brain activity and behavior can be observed. Neuroscientific models of attentional capture remain speculative, because certain brain regions may well be exquisitely sensitive to differences in particular image features (such as features related to emotional valence) without being pivotal for the attribution of salience and the recruitment of cognitive processing resources. Evidence of trial-by-trial correlations between brain activity and behavioral measures such as RTs, however, can substantiate the claim that a brain region is pivotal for salience attribution. Such a trial-by-trial correlation is implemented in our model by a neural event regressor whose amplitude is proportional to the trial-specific deviation of the RT from the mean RT across the entire run. This parametric modulator GLM approach was established in fMRI long ago (Buchel, Holmes, Rees, & Friston, 1998) and was employed in our last paper (Marxen et al., 2021). However, even trial-by-trial correlations with RT are not yet sufficient to prove the causal role of a particular region in mediating the distractor effect. RTs in cued tasks are generally variable, and regions that track RT may be related to the execution of the motor response rather than to the processing of the distractor. Thus, a further requirement for causality is the temporal precedence of the causal event over the caused event.
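The RT-modulated event regressor described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline. It assumes an SPM-style canonical double-gamma HRF and a 10 ms grid for placing the stick functions. The helper names `canonical_hrf` and `rt_modulated_regressor` are hypothetical.

```python
import numpy as np
from math import gamma


def canonical_hrf(t):
    """SPM-style double-gamma HRF (shapes 6 and 16, scale 1 s, undershoot ratio 1/6)."""
    t = np.clip(t, 0.0, None)  # the response is zero before the event
    peak = t**5 * np.exp(-t) / gamma(6)
    undershoot = t**15 * np.exp(-t) / gamma(16)
    return peak - undershoot / 6.0


def rt_modulated_regressor(onsets, rts, n_scans, tr, dt=0.01):
    """Stick functions at `onsets`, scaled by each trial's RT deviation from the
    run-mean RT, convolved with the HRF and sampled once per scan."""
    onsets = np.asarray(onsets, dtype=float)
    rts = np.asarray(rts, dtype=float)
    amplitudes = rts - rts.mean()           # trial-specific RT deviation
    n_fine = int(round(n_scans * tr / dt))  # high-resolution time grid
    sticks = np.zeros(n_fine)
    np.add.at(sticks, np.round(onsets / dt).astype(int), amplitudes)
    kernel = canonical_hrf(np.arange(0.0, 32.0, dt))
    bold = np.convolve(sticks, kernel)[:n_fine]
    scan_idx = np.round(np.arange(n_scans) * tr / dt).astype(int)
    return bold[scan_idx]


# Hypothetical run with two trials, using TR = 0.756 s as in the text.
reg = rt_modulated_regressor([10.0, 40.0], [0.4, 0.6], n_scans=100, tr=0.756)
```

The regressor has mean-centred amplitudes. A faster-than-average trial therefore contributes a negative response and a slower trial a positive one. A constant RT yields an all-zero regressor.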
3. Regions for Salience Attribution should respond to the stimuli early in the processing chain, as they need to influence the subsequent distribution of cognitive resources. This is where CTMC offers a major advance over existing fMRI analysis methods. Note that the whole cognitive process takes approximately 500 ms, which is short compared with the width of the HRF and the sampling interval of TR = 0.756 s. Thus, neural events are modeled here as stick functions (Fig. 1) rather than as rectangular functions. Given the basic structure of the task, 'early' in this context becomes synonymous with 'time-locked to the onset of the distractor stimulus' rather than to the motor response. The task is a cued RT task with an additional early causal effector, such as emotional image valence, that influences the RT. Such a task design is one of the major tools of experimental cognitive psychology. Importantly for CTMC, RTs vary on a time scale (SD ≈ 70 ms) that causes a sufficiently variable shift (jitter) in the BOLD fMRI responses. This jitter makes it possible to differentiate signals time-locked to the effector onset, i.e., the beginning of the cognitive process, from those time-locked to the motor reaction, which marks the end of the cognitive process. Our data show that such a temporal differentiation is not only replicable but also largely in line with existing knowledge on brain organization (see below). Importantly, this differentiation is not possible in the conventional fMRI GLM framework. Adding a motor regressor to the GLM on top of the distractor onset regressor is not feasible. The vascular response is broad (~5 s), so regressors separated by only hundreds of milliseconds would be strongly collinear, which would in turn render the fit unstable.
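The following sketch makes the early/late distinction concrete. It builds one regressor for each of four hypothetical model variants: distractor-locked or response-locked sticks, each either unmodulated or scaled by the RT deviation. It then checks how strongly the early and late regressors correlate and selects a winning model by fitting each candidate separately. The model labels follow the names used in this supplement. However, the exact regressor definitions, the residual-sum-of-squares criterion and the simulated timing (60 trials every 12 s, RT ≈ 500 ± 70 ms) are assumptions for illustration. The selection criterion actually used for CTMC is described in the main paper.

```python
import numpy as np
from math import gamma

TR = 0.756  # s, repetition time reported in the text
DT = 0.01   # s, fine grid used to place stick functions (assumption)


def canonical_hrf(t):
    """SPM-style double-gamma HRF (shapes 6 and 16, undershoot ratio 1/6)."""
    t = np.clip(t, 0.0, None)
    return t**5 * np.exp(-t) / gamma(6) - t**15 * np.exp(-t) / (6.0 * gamma(16))


def stick_regressor(times, amplitudes, n_scans):
    """Convolve weighted stick functions with the HRF and sample at each scan."""
    n_fine = int(round(n_scans * TR / DT))
    sticks = np.zeros(n_fine)
    np.add.at(sticks, np.round(np.asarray(times) / DT).astype(int), amplitudes)
    bold = np.convolve(sticks, canonical_hrf(np.arange(0.0, 32.0, DT)))[:n_fine]
    return bold[np.round(np.arange(n_scans) * TR / DT).astype(int)]


def candidate_models(onsets, rts, n_scans):
    """One regressor per hypothetical model: early (distractor-locked) or late
    (response-locked), unmodulated or modulated by the RT deviation."""
    onsets = np.asarray(onsets, dtype=float)
    rts = np.asarray(rts, dtype=float)
    unit = np.ones_like(rts)
    dev = rts - rts.mean()
    return {
        "Distractor Perception": stick_regressor(onsets, unit, n_scans),
        "Salience Attribution": stick_regressor(onsets, dev, n_scans),
        "Task Execution": stick_regressor(onsets + rts, unit, n_scans),
        "Modulated Task Execution": stick_regressor(onsets + rts, dev, n_scans),
    }


def winning_model(y, models):
    """Fit each model separately (regressor + intercept); pick the lowest RSS."""
    rss = {}
    for name, x in models.items():
        X = np.column_stack([x, np.ones_like(x)])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss[name] = float(np.sum((y - X @ beta) ** 2))
    return min(rss, key=rss.get), rss


# Simulated run: 60 trials every 12 s, RT ~ N(500 ms, 70 ms).
rng = np.random.default_rng(0)
onsets = 5.0 + 12.0 * np.arange(60)
rts = rng.normal(0.5, 0.07, size=60)
n_scans = int((onsets[-1] + 40.0) / TR)
models = candidate_models(onsets, rts, n_scans)

# Early and late regressors are nearly collinear, so a joint GLM would be unstable.
r = np.corrcoef(models["Distractor Perception"], models["Task Execution"])[0, 1]
# Noise-free voxel generated by the early, RT-modulated model:
best, rss = winning_model(2.0 * models["Salience Attribution"] + 1.0, models)
print(f"r(early, late) = {r:.3f}; winning model: {best}")
```

With real data, noise, nuisance regressors and a penalty for differing numbers of parameters would typically enter the comparison. The noise-free voxel here only shows that separate fits can discriminate regressors whose correlation is too high for a joint GLM.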
In addition, GLM models may be extended by the temporal derivative of a condition regressor in order to measure the latency of the hemodynamic response (Calhoun, Stevens, Pearlson, & Kiehl, 2004). We considered such temporal derivatives for all models to investigate whether they would improve the models and found that this was not the case for most of the brain (see Fig. S4). 85% of all brain voxels are assigned the same model, or fall into the not-replicable category, in both comparisons. This observation indicates that model selection between early and late models was not driven by the mean delay of the response with respect to the distractor onset (~[400+200] ms, see Fig. 1) but rather by the variability of the RT. This is crucial, because knowledge of the hemodynamic latency with respect to the distractor onset would not have allowed us to draw conclusions about relative neural latencies, as discussed in the main paper. Our results show that model comparison (CTMC), in contrast to a GLM with additional regressors, has the sensitivity necessary to separate a cognitive process into an early and a late phase. Thus, CTMC increases the temporal resolution of fMRI to time scales relevant for cognitive processes without being vulnerable to misinterpretation due to hemodynamic delays. The insula is particularly susceptible to such misinterpretation, because data by Chang et al. (Chang, Thomason, & Glover, 2008; personal communication) indicate shorter latencies of the hemodynamic response in the insula than in most of the cortex.
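Derivative regressors rest on a first-order Taylor expansion, h(t - d) ≈ h(t) - d·h'(t). A small, uniform latency shift d of the response is thus absorbed by the weight on h'. The sketch below recovers such a shift from a noise-free shifted response. It assumes an SPM-style canonical HRF and uses a hypothetical helper `estimate_latency_shift`. It illustrates why derivative models capture a mean delay, whereas the early-versus-late comparison relies on trial-wise RT jitter.

```python
import numpy as np
from math import gamma


def canonical_hrf(t):
    """SPM-style double-gamma HRF (shapes 6 and 16, undershoot ratio 1/6)."""
    t = np.clip(t, 0.0, None)
    return t**5 * np.exp(-t) / gamma(6) - t**15 * np.exp(-t) / (6.0 * gamma(16))


def estimate_latency_shift(y, t):
    """Fit y ~ b_h * h(t) + b_d * h'(t) and return the implied shift d.

    First-order Taylor expansion: h(t - d) ≈ h(t) - d * h'(t),
    hence d ≈ -b_d / b_h for small d.
    """
    h = canonical_hrf(t)
    dh = np.gradient(h, t)  # numerical temporal derivative of the HRF
    X = np.column_stack([h, dh])
    (b_h, b_d), *_ = np.linalg.lstsq(X, y, rcond=None)
    return -b_d / b_h


t = np.arange(0.0, 32.0, 0.01)  # s
for true_shift in (-0.3, 0.3):
    est = estimate_latency_shift(canonical_hrf(t - true_shift), t)
    print(f"true shift {true_shift:+.2f} s -> estimated {est:+.3f} s")
```

For shifts of a few hundred milliseconds, the linear approximation is close. Larger shifts would need higher-order terms or explicitly shifted regressors.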
The temporal resolution of CTMC is not yet on par with electrophysiological measures. These are capable of resolving timing differences on the order of neural conduction times between directly connected neurons (~10 ms; Carretie, Yadav, & Mendez-Bertolo, 2021). However, it is not obvious that such a high resolution is required to identify the role of local neuronal ensembles (as measured by all non-invasive neuroimaging techniques). Such ensembles remain active for hundreds of milliseconds, are interconnected locally, and receive additional inputs from other regions that may be crucial for subsequent cognitive processes such as the recruitment of attention (Gothard, 2020). Such interactions require additional time. The question of exactly when the brain is able to distinguish emotional from non-emotional signals (Carretie et al., 2021), for example, may not be very helpful for understanding the neurocognitive process of attention capture, for two reasons. First, the information on valence is already present when an image is projected onto the retina, i.e., this information could in principle be extracted by a measurement device. Second, the mere classification of a stimulus may not yet be synonymous with the attribution of salience. The same stimulus may conceivably be more or less salient, i.e., distracting, depending on other factors such as the momentary brain state of the subject when the stimulus is perceived. One brain state may lead to the recruitment of attention and cognitive resources, another may not. Thus, it is not sufficient for a Salience Attribution region merely to differentiate negative from neutral images; the prediction of RTs is crucial.
The association of a particular cognitive process with a specific fMRI model is often not trivial. The yellow cluster in the visual system in Fig. 2, for example, indicates a late activation that we assigned to the Task Execution process. It is not clear, however, whether this association is appropriate, given that the signal may also be related to deeper processing of the distractor images rather than to an evaluation of the task cues.
In general, we cannot observe cognitive processes directly but can only infer them from observations. CTMC allows us to test related hypotheses that include process timing, but the interpretation of the data in terms of cognitive processes remains challenging and will still require converging evidence from multiple experimental manipulations. Additionally, we cannot rule out that certain brain areas (e.g., the amygdala) are still involved in certain cognitive processes (e.g., Salience Attribution), even though the fMRI signal does not indicate this.

Supplementary Results
Behavioral results are summarized in Table 1 for RTs and in Table S1 for error rates for Study A, Study B1 and Study B2. Further details of the fMRI model comparison are provided in Figs. S1-S5. Fig. S1 is a more detailed version of Fig. 2 of the main manuscript and is explained below. Fig. S2 is a mosaic version of the replicably-winning-model map used in Fig. 2 and Fig. S1. It provides additional sagittal views of the model comparison results within the negative > neutral distractor ROI. Fig. S3 is a whole-brain version of this replicably-winning-model map. Figs. S4 and S5 compare the whole-brain replicably-winning-model map from Fig. S3 with an equivalent map for an 8-model comparison. That comparison included the original four models from the main paper plus four additional models with temporal derivatives, which mimic latency shifts of the HRF. Suppl. Data S1 at https://osf.io/kqz6a/ provides digital versions of the 3D winning-model maps and a manual for viewing these maps with MRIcroGL (https://www.nitrc.org/projects/mricrogl/). It also contains two Excel tables (ROI and whole brain) with information about each cluster and its overlap with particular atlas regions. Suppl. Data S1 is part of the Open Science Framework project "SFB940/1-A7: Volitional Control of Brain Activity: Effects of Neurofeedback on Emotional Reactivity" (https://osf.io/6afq5/). Fig. S6 and Table S2 show the results of the replication of Study A in Study B1.
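The replicably-winning-model maps and the percentages quoted for them can be computed from per-voxel winner labels, as sketched below. This sketch assumes that a voxel counts as replicated when the same model wins in Study B1 and Study B2. It also assumes that the 4- versus 8-model agreement counts identical labels, including 'not replicated'. The label coding and function names are hypothetical.

```python
import numpy as np

NOT_REPLICATED = -1  # hypothetical label for voxels without a consistent winner


def replicably_winning_map(winner_b1, winner_b2):
    """Keep a voxel's winning model only if the same model wins in both studies."""
    winner_b1 = np.asarray(winner_b1)
    winner_b2 = np.asarray(winner_b2)
    return np.where(winner_b1 == winner_b2, winner_b1, NOT_REPLICATED)


def replicability(rep_map):
    """Fraction of voxels with a replicated winning model."""
    return float(np.mean(np.asarray(rep_map) != NOT_REPLICATED))


def map_agreement(map_a, map_b):
    """Fraction of voxels with identical labels (including 'not replicated')."""
    return float(np.mean(np.asarray(map_a) == np.asarray(map_b)))


# Toy example with five voxels; models are coded 0-3 (hypothetical coding).
b1 = [0, 1, 2, 3, 0]
b2 = [0, 1, 3, 3, 2]
rep = replicably_winning_map(b1, b2)  # -> [0, 1, -1, 3, -1]
print(rep, replicability(rep))        # replicability 0.6
```

Applied to whole-brain label volumes, `replicability` would correspond to the replicated-voxel percentages in the figure captions. `map_agreement` between the 4-model and 8-model maps would correspond to the share of voxels assigned the same label in both comparisons.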

Explanation of Fig. S1
In the main manuscript, the ACES task served primarily as an application example of the CTMC method. Attention capture, however, is a very important human capacity that is intensely studied by experimental psychologists and cognitive neuroscientists. We therefore discuss our results and their implications in more detail here, going beyond the amygdala and the anterior insula already discussed in the main manuscript. We still focus on regions that are part of the 'negative > neutral' distractor contrast (see Fig. S1).
First, we focus on the early visual pathway. It is well known that visual information is passed from the eye via the optic nerve to the 'visual' thalamus, and subsequently via the optic radiations to the primary visual cortex (cluster 25) (Carretie et al., 2021). In their recent review, Carretie et al. (2021) emphasize the potential role of the visual thalamus, specifically the lateral geniculate nucleus (LGN), the pulvinar and the thalamic reticular nucleus (TRN), as initial evaluation structures (IES). The authors argue that regions such as the amygdala, insula and ventral prefrontal cortex (vPFC) respond too slowly to explain early electrophysiological response differences between valence classes in the visual cortex at 80 ms. Amygdala, insula and vPFC show such differences no earlier than 74 ms, which would be too late to transmit the information to the visual cortex in time.
LGN responses, in contrast, show such differences as early as 20-40 ms. Consistent with this prior knowledge, the thalamus (cluster 15) displays the features of an early Distractor Perception region in our analysis. However, our operational definition of salience attribution requires a measurable prediction of behavior (i.e., RTs). In this regard, our winning-model map provides no evidence that such attribution already occurs in the thalamus. Likewise, neither the primary visual cortex (cluster 25) nor secondary visual/attention regions (clusters 50 and 5) show such predictive power. It is conceivable that the amplification of emotional stimuli necessary to cause distraction with measurable behavioral differences requires the involvement of higher-order cortical regions such as the dorsal anterior insula (dAI).
Second, the replicably-winning-model map associates particular subregions of the negative > neutral contrast ROI with one of the four fMRI models (see Data S1 for a complete list).
Regions that show Distractor Perception properties include the dorsal attention network regions fundus of the superior temporal gyrus (FST) and the inferotemporal areas PH and TE2 (cluster 5) as well as the frontal and premotor eye fields (cluster 13), the LB amygdala (cluster 14), the salience network regions anterior insula and area 55b (cluster 13) and default mode network regions close to the insula region (cluster 13) and in the orbitofrontal cortex (area 10v, cluster 10).
Within the ROI, only the salience network regions dAI (cluster 41) and SMA (cluster 48) and the control network region sIFG (area 6, cluster 47) are predictive of RT, i.e., they are considered early Salience Attribution regions. Importantly, other regions modulated by RT, such as the default mode network region vAI (cluster 97) and the control network region 8BM (cluster 106), are only later-activated Modulated Task Execution regions. Note that the SMA region (cluster 48) largely switches from the Salience Attribution model to the Modulated Task Execution + Derivatives model when temporal derivatives are additionally included (see Suppl. Data S1 and Figs. S4 and S5). Thus, only dAI and sIFG remain as core salience attribution regions. Besides the dAI as a pivotal salience attribution region, it is highly interesting that the sIFG also shows Salience Attribution properties. This region is rarely considered in this context and deserves more attention in future investigations.
A number of regions close to Early Perception regions are more closely related to the motor response, i.e., they are considered later-activated Task Execution regions. These include the visual network region FFC (cluster 50), the dorsal attention network area TF (cluster 50), the limbic area 10v (cluster 56), the default mode area 9 (cluster 56) and the SF amygdala (cluster 61). The spreadsheet file cluster_stats_atlas_ROIs_neg_larger_neu_dense.xlsx in Suppl. Data S1 (https://osf.io/kqz6a/ -> subdirectory \MRICroGL_Files\Winning_Model_Maps\) contains a detailed list of atlas regions for each cluster of the winning-model map.
Third, we elaborate on the results for the amygdala and insula, which are considered key regions for attentional capture by emotional stimuli. Beyond confirming the anterior insula as a salience attribution region, our data show a functional segregation of the anterior insula signal (Fig. 3) that is in line with insula cytoarchitectonics (Augustine, 1996; Mesulam & Mufson, 1982). It is also consistent with fMRI studies linking vAI and dAI to distinct networks and functions. For example, Deen et al. (Deen, Pitskel, & Pelphrey, 2011) found that the vAI was closely linked with the pregenual ACC (pACC) and the default mode network (DMN), whereas neural activity in the dAI was primarily correlated with the dorsal anterior cingulate cortex (dACC). Notably, dAI and dACC anchor the salience network (SN) and relate to the cognitive control network (for a review, see Uddin, 2015). Interestingly, examination of the insula using yet another approach, namely quantitative modelling of multi-shell diffusion MRI data, yields the same organizational differentiation of this region (Menon et al., 2020). Thus, the CTMC results presented here are clearly consistent with previous parcellations of the insula into dorsal and ventral subregions. Furthermore, our results match prior knowledge on the functional relevance of the insula in salience processing and extend this knowledge to the temporal sequence of events.
For the amygdala, our findings (see Fig. 4) were less anticipated by the existing literature; rather, they complement existing knowledge on amygdala subdivisions. First, note that the negative > neutral distractor ROI covers only 23% of the left amygdala and 22% of the right amygdala. It covers even less (< 10%) of the latero-basal (LB) subdivision, by far the largest, according to the probability maps of the Anatomy Toolbox (Amunts et al., 2005) (see Suppl. Data S1). Activation (negative > neutral) is found in all subdivisions in both hemispheres. This might be expected from the well-known emotion-related functions of the amygdala (Davis & Whalen, 2001; Gothard, 2020; Phan, Wager, Taylor, & Liberzon, 2002; Phelps & LeDoux, 2005). It is also expected from the view that the amygdala serves as a multi-dimensional classifier, potentially without strict categorical subdivisions (Gothard, 2020). Within all subdivisions, we confirmed our previous result (Marxen et al., 2021) that the amygdala signal is not associated with RT; only Distractor Perception and Task Execution components were found. Interestingly, Distractor Perception components dominated the activity only in the right LB amygdala (8 of 10 voxels). All other subdivisions of the amygdala were dominated by the later Task Execution component. Past studies have consistently found distinct functions and functional connectivity in these subregions (Bzdok, Laird, Zilles, Fox, & Eickhoff, 2013; Caparelli et al., 2017; Kim et al., 2011; Roy et al., 2009). Hemispheric lateralization of amygdala function is also known (Baas, Aleman, & Kahn, 2004; Fruhholz et al., 2015; Gainotti, 2020; Ji & Neugebauer, 2009; Palomero-Gallagher & Amunts, 2021; Zhang et al., 2018).
Our finding supplements the notion that the LB is primarily associated with significance detection in high-level sensory input. In the same account, the centromedial (CM) subdivision mediates attentional and motor responses as a main output centre, and the superficial (SF) subdivision is associated with olfactory and social information processing (Bzdok et al., 2013). Our findings suggest that the LB, specifically in the right hemisphere, is dominated by inputs from primary visual regions (such as the visual thalamus). Note also that amygdala parcellations vary depending on the data used [for example: cytoarchitectonics (Amunts et al., 2005), functional connectivity (Roy et al., 2009), structural connectivity (Bach, Behrens, Garrido, Weiskopf, & Dolan, 2011)] or the methods used [for example: recurrence quantification analysis (Bielski, Adamus, Kolada, Raczaszek-Leonardi, & Szatkowska, 2021), plausibility guided hierarchical clustering (Klein-Flügge, 2020), semi-supervised spectral clustering (Zhang et al., 2018)]. This variation may explain the low overlap of Amunts' LB amygdala ROI with the negative > neutral contrast in our data, even though a stronger involvement of the LB nuclei might be expected.
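Coverage figures such as the 23% of the left amygdala covered by the negative > neutral ROI depend on how the probabilistic atlas region is turned into a voxel set. The sketch below shows two common options: probability-weighted coverage and coverage of a thresholded region. Which variant underlies the reported numbers is not stated here, so both the function and the toy values are hypothetical.

```python
import numpy as np


def region_coverage(roi_mask, region_prob, threshold=None):
    """Fraction of an atlas region that lies inside a binary ROI.

    threshold=None: probability-weighted coverage, sum(p * roi) / sum(p).
    threshold=t:    binarize the region at p >= t, then count voxels.
    """
    roi = np.asarray(roi_mask, dtype=bool)
    p = np.asarray(region_prob, dtype=float)
    if threshold is None:
        return float(np.sum(p * roi) / np.sum(p))
    region = p >= threshold
    return float(np.sum(roi & region) / np.sum(region))


# Toy example with four voxels (hypothetical values).
roi = [1, 1, 0, 0]
prob = [0.9, 0.3, 0.6, 0.2]
print(region_coverage(roi, prob), region_coverage(roi, prob, threshold=0.5))
```

The two definitions can differ noticeably for small or low-probability subdivisions such as the amygdala nuclei. This is one more reason why reported overlap percentages depend on the atlas handling.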
In summary, Figs. 2 and S1 demonstrate that CTMC results are congruent with numerous previous findings on brain functions and parcellations and offer additional insights into the timing of cognitive processes.
Fig. S2. […] Fig. 2 and S1 for sagittal slices for X = -70 to +72 mm in MNI space. In Study B, the four models explained in Fig. 1 were compared within the negative > neutral distractor ROI from Study A. White regions were not replicated between Study B1 (N=40) and B2 (N=33). 65% of all voxels within the ROI were replicated.
Fig. S3. […] Fig. S2 but for the whole-brain mask. 64% of all voxels within the whole-brain mask were replicated.
Fig. S4. Comparison of replicably-winning-model maps (MNI X = -68 to +68 mm) for the 4-model comparison as in Fig. S3 (left) and the 8-model comparison that includes temporal derivatives (right). 85% of all brain voxels are assigned the same model (or not-replicable). Replicability for the 8-model comparison on the right is 61% as compared to 64% on the left. Models with derivatives win replicably in less than 2% of all brain voxels.
[…] (1) Fig. S6 C & D). A cluster threshold of p < 0.001 and a small volume correction with pFWE < 0.05 using a bilateral amygdala and bilateral anterior insula mask was applied to (I) the contrast "Negative > Neutral Distractor Images" of the Valence Model and (II) the parametric contrast "Positive Correlation with RT" for the RT Model. Clusters covering part of the amygdala are marked with "*". This […]